Presence-absence reasoning for evolutionary phenotypes
Abstract
Nearly invariably, phenotypes are reported in the scientific literature in meticulous detail, utilizing the full expressivity of natural language. Both detail and expressivity are usually driven by study-specific research questions. However, research aiming to synthesize or integrate phenotype data across studies or even disciplines often needs to abstract from detailed observations in order to construct phenotypic concepts that are common across many datasets rather than specific to a few. Yet observations or facts that would fall under such abstracted concepts are typically not directly asserted by the original authors, usually because they are “obvious” according to common domain knowledge, so that anyone with sufficient domain experience would deem asserting them redundant. For example, a phenotype describing the length of a manual digit of an organism implicitly means that the organism must have had a hand, and thus a forelimb. In this way, the presence or absence of a forelimb may have supporting data across a far wider range of taxa than the length of a particular manual digit, and may also have wider applications in biological research questions. For large-scale computational integration of phenotypes, the challenge is then how machines can be enabled to infer facts that are implied by, but not explicitly included in, the phenotype observations recorded by the original author(s). As descriptions in natural language, phenotype data must first be transformed to become amenable to computational processing at all. An approach that has had considerable success in rendering phenotypes computable is to annotate the free-text descriptions with ontology terms drawn from anatomy, quality, spatial, taxonomy, and other pertinent ontologies, following a common formalism.
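The manual digit example can be sketched in a few lines of Python. This is only an illustration of the reasoning step, not the Phenoscape pipeline: the `PART_OF` table is a hypothetical toy partonomy (it is not taken from Uberon), and `implied_presences` is a name introduced here. It assumes that an observation about a part implies the presence of every structure that the part belongs to.

```python
from typing import Dict, Set

# Toy partonomy (hypothetical, not taken from Uberon):
# each entity maps to the structures it is directly part of.
PART_OF: Dict[str, Set[str]] = {
    "manual digit": {"manus"},
    "manus": {"forelimb"},
    "forelimb": set(),
}


def implied_presences(entity: str, part_of: Dict[str, Set[str]]) -> Set[str]:
    """Return the entity plus every structure it is transitively part of."""
    seen: Set[str] = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Follow part_of links upward; unknown entities have no wholes.
        stack.extend(part_of.get(current, set()))
    return seen


# A phenotype about a manual digit implies a hand (manus) and a forelimb.
implied = implied_presences("manual digit", PART_OF)
print(sorted(implied))
```

A real ontology reasoner works over logical axioms rather than a dictionary, but the upward walk over part-of links is the intuition behind the inference.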
The aforementioned challenge is then, specifically, how a machine reasoner can be enabled to infer implied phenotypes from those asserted, given the anatomy (and other) domain knowledge asserted by ontology axioms in subclass, partonomy, and other hierarchies. Here we describe how, within the Phenoscape project, we use a pipeline of axiom generation and inference steps to address this challenge specifically for inferring taxon-specific presence/absence of anatomical entities from anatomical phenotypes. These phenotypes are primarily derived from published comparative anatomical treatments (descriptions of new species or reviews of larger clade interrelationships) in the form of morphological character state matrices, which document for a set of characters the evolutionary patterns of variation (the character states) across a set of taxa (Dahdul et al. 2010). Using the Phenex data annotation tool (Balhoff et al. 2010), Phenoscape curators annotate each character state using the Entity–Quality (EQ) formalism (Mungall et al. 2007, 2010). Anatomical entities are represented by terms from the comprehensive Uberon anatomy ontology for metazoan animals (Haendel et al. 2014), qualities (e.g., presence/absence, size, shape, composition, color) are drawn from the Phenotype and Trait (PATO) ontology (Gkoutos et al. 2005), and terms for vertebrate taxa are taken from the Vertebrate Taxonomy Ontology (VTO) (Midford et al. 2013). The Phenoscape Knowledgebase (KB, http://phenoscape.org/) is essentially a triple store that integrates such ontology-annotated phenotype data across all studies and data sources and allows querying them. Although presence/absence is but one, and a seemingly simple, way to abstract phenotypes across data sources, it can nonetheless be powerful for linking genotype to phenotype (Hiller et al. 2012), and it is particularly relevant for constructing synthetic morphological supermatrices for comparative analysis; in fact, presence/absence is one of the prevailing character observation types in published character matrices, accounting for 25-50% of data in some large morphological matrices (Sereno 2009).
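To make the abstraction from EQ annotations to presence/absence concrete, the following is a minimal sketch under stated assumptions; it is not the Phenoscape implementation. The `EQAnnotation` class, the taxon names, the plain-string qualities, and the toy `PART_OF` table are all hypothetical. The sketch assumes two rules: any quality other than "absent" implies that the entity and the structures it is part of are present, and an "absent" annotation implies that the entity and its parts are absent.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Toy partonomy (hypothetical): part -> the whole it belongs to.
PART_OF: Dict[str, str] = {"manual digit": "manus", "manus": "forelimb"}


@dataclass(frozen=True)
class EQAnnotation:
    taxon: str    # stand-in for a taxonomy term
    entity: str   # stand-in for an anatomy term
    quality: str  # stand-in for a quality term, e.g. "absent", "elongated"


def wholes(entity: str) -> List[str]:
    """All structures the entity is transitively part of."""
    out = []
    while entity in PART_OF:
        entity = PART_OF[entity]
        out.append(entity)
    return out


def parts(entity: str) -> List[str]:
    """All structures that are transitively part of the entity."""
    out, frontier = [], [entity]
    while frontier:
        whole = frontier.pop()
        for part, parent in PART_OF.items():
            if parent == whole:
                out.append(part)
                frontier.append(part)
    return out


def presence_absence(annotations: List[EQAnnotation]) -> Dict[Tuple[str, str], Set[str]]:
    """Map (taxon, entity) to the set of inferred states; conflicts stay visible."""
    matrix: Dict[Tuple[str, str], Set[str]] = {}
    for ann in annotations:
        if ann.quality == "absent":
            # Assumed rule: absence of a whole implies absence of its parts.
            targets, state = [ann.entity] + parts(ann.entity), "absent"
        else:
            # Assumed rule: any other quality implies the entity and its wholes exist.
            targets, state = [ann.entity] + wholes(ann.entity), "present"
        for entity in targets:
            matrix.setdefault((ann.taxon, entity), set()).add(state)
    return matrix


matrix = presence_absence([
    EQAnnotation("Taxon A", "manual digit", "elongated"),
    EQAnnotation("Taxon B", "forelimb", "absent"),
])
```

Keeping a set of states per cell, rather than a single value, means that contradictory annotations for the same taxon and entity show up as `{"present", "absent"}` instead of silently overwriting each other.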
Journal: CoRR
Volume: abs/1410.3862
Publication date: 2014